{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[View the runnable example on GitHub](https://github.com/intel-analytics/BigDL/tree/main/python/nano/tutorial/notebook/training/tensorflow/accelerate_tensorflow_training_multi_instance.ipynb)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Accelerate TensorFlow Keras Training using Multiple Instances"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "BigDL-Nano provides `bigdl.nano.tf.keras.Model` and `bigdl.nano.tf.keras.Sequential`, which extend `tf.keras.Model` and `tf.keras.Sequential` respectively with various optimizations. To use multi-instance training on a server with multiple CPU cores or sockets, replace `tf.keras.Model`/`Sequential` in your code with `bigdl.nano.tf.keras.Model`/`Sequential`, and call `fit` with `num_processes` specified."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "nbsphinx": "hidden"
   },
   "source": [
    "To use multiple instances for TensorFlow Keras training, you need to install BigDL-Nano for TensorFlow:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "nbsphinx": "hidden"
   },
   "outputs": [],
   "source": [
    "# install the nightly-built version of bigdl-nano for tensorflow;\n",
    "# intel-tensorflow will be installed at the same time, with Intel's oneDNN optimizations enabled by default\n",
    "!pip install --pre --upgrade bigdl-nano[tensorflow]\n",
    "!source bigdl-nano-init  # set environment variables"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> 📝 **Note**\n",
    ">\n",
    "> Before starting your TensorFlow Keras application, it is highly recommended to run `source bigdl-nano-init` to set several environment variables based on your current hardware. Empirically, these variables bring a significant performance increase for most TensorFlow Keras training workloads."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "nbsphinx": "hidden"
   },
   "source": [
    "> ⚠️ **Warning**\n",
    "> \n",
    "> For Jupyter Notebook users, we recommend running the commands above, especially `source bigdl-nano-init`, before the Jupyter kernel is started; otherwise some of the optimizations may not take effect."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "nbsphinx": "hidden"
   },
   "outputs": [],
   "source": [
    "# install dependency for the dataset used in the following example\n",
    "!pip install tensorflow-datasets"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First, **import** `Model` **or** `Sequential` **from** `bigdl.nano.tf.keras` **instead of** `tf.keras`. Let’s take the `Model` class here as an example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# from tensorflow.keras import Model\n",
    "from bigdl.nano.tf.keras import Model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Suppose we would like to train a [ResNet50 model](https://keras.io/api/applications/resnet/#resnet50-function) (pretrained on the ImageNet dataset) on the [imagenette](https://www.tensorflow.org/datasets/catalog/imagenette) dataset. We need to create the corresponding train/test datasets and define the model:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "nbsphinx": "hidden"
   },
   "outputs": [],
   "source": [
    "# Define the train/test dataset creator and the model inputs/outputs\n",
    "\n",
    "import tensorflow as tf\n",
    "import tensorflow_datasets as tfds\n",
    "\n",
    "def create_datasets(img_size, batch_size):\n",
    "    (train_ds, test_ds), info = tfds.load('imagenette/320px-v2',\n",
    "                                          data_dir='/tmp/data',\n",
    "                                          split=['train', 'validation'],\n",
    "                                          with_info=True,\n",
    "                                          as_supervised=True)\n",
    "    \n",
    "    num_classes = info.features['label'].num_classes\n",
    "    \n",
    "    def preprocessing(img, label):\n",
    "        return tf.image.resize(img, (img_size, img_size)), \\\n",
    "               tf.one_hot(label, num_classes)\n",
    "\n",
    "    train_ds = train_ds.repeat().map(preprocessing).batch(batch_size)\n",
    "    test_ds = test_ds.map(preprocessing).batch(batch_size)\n",
    "    return train_ds, test_ds, info\n",
    "\n",
    "\n",
    "from tensorflow.keras import layers\n",
    "from tensorflow.keras.applications import ResNet50\n",
    "\n",
    "def define_model_inputs_outputs(num_classes, img_size):\n",
    "    inputs = tf.keras.layers.Input(shape=(img_size, img_size, 3))\n",
    "    x = tf.cast(inputs, tf.float32)\n",
    "    x = tf.keras.applications.resnet50.preprocess_input(x)\n",
    "    backbone = ResNet50(weights='imagenet')\n",
    "    backbone.trainable = False\n",
    "    x = backbone(x)\n",
    "    x = layers.Dense(512, activation='relu')(x)\n",
    "    outputs = layers.Dense(num_classes, activation='softmax')(x)\n",
    "\n",
    "    return inputs, outputs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# create train/test datasets\n",
    "train_ds, test_ds, ds_info = create_datasets(img_size=224, batch_size=32)\n",
    "\n",
    "# Model creation steps are the same as using tf.keras.Model\n",
    "inputs, outputs = define_model_inputs_outputs(num_classes=ds_info.features['label'].num_classes, \n",
    "                                              img_size=224)\n",
    "\n",
    "model = Model(inputs=inputs, outputs=outputs)\n",
    "model.compile(loss=\"categorical_crossentropy\", optimizer=\"adam\", metrics=['accuracy'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; _The definition of_ `create_datasets` _and_ `define_model_inputs_outputs` _can be found in the_ [runnable example](https://github.com/intel-analytics/BigDL/tree/main/python/nano/tutorial/notebook/training/tensorflow/accelerate_tensorflow_training_multi_instance.ipynb)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can then **call the** `fit` **method with** `num_processes` **set to an integer larger than 1** to launch the specified number of processes for data-parallel training:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model.fit(train_ds,\n",
    "          epochs=10,\n",
    "          steps_per_epoch=(ds_info.splits['train'].num_examples // 32),\n",
    "          num_processes=2)"
   ]
  },
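  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When picking a value for `num_processes`, it can help to see how many cores each process would get under an even split. The cell below is a minimal sketch under stated assumptions, not BigDL-Nano's actual allocation logic: `split_cores` is a hypothetical helper introduced only for illustration, and `os.cpu_count()` reports logical cores, which may differ from what Nano actually assigns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
```python
# Illustrative only: split_cores is a hypothetical helper,
# not part of BigDL-Nano and not its real allocation logic.
import os


def split_cores(num_cores, num_processes):
    # Give each process a contiguous, near-equal block of core ids.
    base, extra = divmod(num_cores, num_processes)
    blocks, start = [], 0
    for rank in range(num_processes):
        # The first `extra` processes take one additional core.
        size = base + (1 if rank < extra else 0)
        blocks.append(list(range(start, start + size)))
        start += size
    return blocks


# os.cpu_count() counts logical cores and may return None.
num_cores = os.cpu_count() or 1
for rank, cores in enumerate(split_cores(num_cores, num_processes=2)):
    print(f'process {rank}: {len(cores)} cores')
```
   ]
  },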
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> 📝 **Note**\n",
    ">\n",
    "> By setting `num_processes`, CPU cores will be automatically and evenly distributed among processes to avoid conflicts and maximize training throughput.\n",
    "> \n",
    "> During Nano TensorFlow Keras multi-instance training, the effective batch size is still the `batch_size` specified in the datasets (32 in this example). This matches the semantics of TensorFlow distributed training (`MultiWorkerMirroredStrategy`), which splits each batch into sub-batches for the different workers."
   ]
  },
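  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The batch-size semantics in the note above can be checked with a little arithmetic. The cell below is a rough sketch, assuming the batch divides evenly among processes: `sub_batch_size` is a hypothetical helper, and `num_train_examples` is a made-up value rather than the real size of the dataset. It shows that the number of steps per epoch depends only on the dataset `batch_size`, not on `num_processes`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
```python
# Illustrative only: sub_batch_size is a hypothetical helper,
# not part of BigDL-Nano or TensorFlow.
def sub_batch_size(global_batch_size, num_processes):
    # This sketch only handles the evenly divisible case.
    if global_batch_size % num_processes != 0:
        raise ValueError('sketch assumes batch_size divisible by num_processes')
    return global_batch_size // num_processes


batch_size = 32              # batch size set on the dataset
num_processes = 2            # value passed to fit
num_train_examples = 10_000  # hypothetical dataset size

# Each process handles a slice of every batch.
per_process = sub_batch_size(batch_size, num_processes)

# Steps per epoch are computed from the full batch size.
steps_per_epoch = num_train_examples // batch_size

print(f'samples per process per step: {per_process}')
print(f'samples per step overall: {per_process * num_processes}')
print(f'steps per epoch: {steps_per_epoch}')
```
   ]
  },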
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> 📚 **Related Readings**\n",
    "> \n",
    "> - [How to install BigDL-Nano](https://bigdl.readthedocs.io/en/latest/doc/Nano/Overview/install.html)\n",
    "> - [How to choose the number of processes for multi-instance training](https://bigdl.readthedocs.io/en/latest/doc/Nano/Howto/Training/General/choose_num_processes_training.html)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3.7.13 ('nano-tf': conda)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.7.13"
  },
  "orig_nbformat": 4,
  "vscode": {
   "interpreter": {
    "hash": "402532f56d486e9f832908f31130bbdf12bd8cb099dfb226783aa2c6b1479100"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
